ProvStore: A Public Provenance Repository

Authors

  • Trung Dong Huynh
  • Luc Moreau

Abstract

ProvStore is the first online public provenance repository supporting the new W3C PROV standards. It allows users and applications to store and (optionally) publish the provenance of their data on the Web. Provenance documents can be transformed, visualized, and shared in various serializations, and all of this functionality is also available to automated applications via a RESTful API (with OAuth support).

1 Provenance Repository

ProvStore (available online at provenance.ecs.soton.ac.uk/store) is the first public repository of provenance supporting the PROV standards for provenance on the Web by the World Wide Web Consortium [MM13]. It provides free accounts to registered users, allowing them to upload and share provenance documents either privately or publicly in various representations (see Figure 1 for an example). Specifically, it supports the Provenance Notation (PROV-N), RDF encoded using the PROV Ontology (PROV-O) in Turtle or TriG format, PROV-XML, and PROV-JSON [HJK13].

By default, documents submitted to ProvStore are private and can only be accessed by their owners. Document owners, however, can share their documents with others in two ways: making a document public, i.e. available to any visitor to ProvStore, or sharing it with specific ProvStore users. The former is useful for users who want to expose the provenance of their resources (e.g. papers, reports, data sets) to the public; the link to a document on ProvStore can be attached to the corresponding resource as its provenance URI (see note 1). In the latter case, authorized users can be assigned different access roles for fine-grained access control: administrator, editor, contributor, or reader. The owner and every role except reader can append new provenance bundles to a document after it has been created. This makes ProvStore suitable for sharing provenance within a collaborating team of humans and/or applications (see Section 3 for more information about ProvStore's application programming interface).
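The sharing rules above can be summarized as a small permission check. The following Python sketch is a hypothetical model, not ProvStore's implementation: the `Document` class, its fields, and its method names are assumptions made for illustration. Only the rules themselves come from the text: documents are private by default, public documents are readable by any visitor, and the owner plus every role except reader may append bundles. The text does not say how administrator, editor, and contributor differ otherwise, so the sketch does not distinguish them.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Role(Enum):
    """Access roles named in the text (the owner is modelled separately)."""
    ADMINISTRATOR = "administrator"
    EDITOR = "editor"
    CONTRIBUTOR = "contributor"
    READER = "reader"


@dataclass
class Document:
    """Hypothetical in-memory model of a stored provenance document."""
    owner: str
    public: bool = False  # documents are private by default
    shares: Dict[str, Role] = field(default_factory=dict)  # user name -> role

    def can_read(self, user: Optional[str]) -> bool:
        # Public documents are readable by any visitor (None means anonymous).
        if self.public:
            return True
        return user is not None and (user == self.owner or user in self.shares)

    def can_append_bundle(self, user: str) -> bool:
        # The owner and every shared role except reader may append new bundles.
        if user == self.owner:
            return True
        role = self.shares.get(user)
        return role is not None and role is not Role.READER
```

For example, with `Document(owner="alice", shares={"bob": Role.EDITOR, "carol": Role.READER})`, both alice and bob may append bundles, carol may only read, and an anonymous visitor sees nothing until `public` is set to `True`.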
ProvStore was funded by the Engineering and Physical Sciences Research Council (EPSRC) as part of project 'Orchid', grant EP/I011587/1.

1 See www.w3.org/TR/prov-aq for more information on provenance access and query. Document links on ProvStore support HTTP content negotiation. For example, if an HTTP request specifies the header Accept: application/json, the PROV-JSON representation of the provenance document is returned.

Fig. 1. A screenshot of a ProvStore document.

On each document page (Figure 1), users can see the document's provenance descriptions in PROV-N, along with some statistics about the number of assertions. ProvStore also provides a number of provenance network metrics [EHM12] calculated on the graph representation of the document. As mentioned above, access links to the various provenance representations are included, in addition to a number of provenance transformations and visualizations (see Section 2). The validity of the document's provenance can be checked directly from the document page, using the external ProvValidator service (see note 2).

2 provenance.ecs.soton.ac.uk/validator

2 Provenance Transformation and Visualization

A provenance document can contain bundles, a PROV construct for grouping a set of provenance descriptions (thereby allowing provenance of provenance to be expressed) [MM13]. To support relating provenance statements across the bundles of a document, ProvStore can produce a flattened representation of the document in which all of its provenance statements are merged into a single flat document. In this representation, the provenance of entities distributed across multiple bundles can be "connected" for further examination.

In addition to the flattened representation, ProvStore provides a number of provenance views: Data Flow (concerned with the flow of information or the transformations of things), Process Flow (concerned with the processes that took place), and Responsibility (assigning responsibility for what happened) [MG13, Ch. 3]. These views are simplified versions of the original document, produced by selecting only the relevant provenance descriptions from it. They facilitate the examination of provenance information by letting users focus on a single aspect of it rather than on the full descriptions. Each view can be applied either to the original document or to its flattened version.

All versions of a ProvStore document (original or flattened, optionally simplified in a provenance view) can be visualized as a static graphical representation in SVG format. In addition, ProvStore provides interactive visualization tools for exploring a provenance graph through a Hive plot (highlighting input, output, and intermediary nodes), a Wheel plot (showing the density of connections to/from nodes), and a Gantt chart (presenting entities, activities, and agents on a timeline). The Hive and Wheel plots also allow filtering by provenance assertion type to simplify the visualizations.

3 RESTful Application Programming Interface (API)

All of the functionality described in the previous sections (with the exception of interactive features such as validation and visualizations) can be accessed programmatically via a RESTful API over the Hypertext Transfer Protocol (HTTP). ProvStore can hence serve as a cloud-based provenance storage-and-publishing service, giving applications a means to make the provenance of their data available online as soon as it is generated or recorded. Authorized applications must authenticate with ProvStore's API either by using their secret API keys or by following the OAuth (version 1) protocol. With the latter, ProvStore enables users of any third-party application registered with it to access their provenance data seamlessly from inside that application.
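To make the API description concrete, here is a minimal client sketch using only Python's standard library. It combines two things the text describes: selecting a serialization through HTTP content negotiation (the Accept header, as in note 1) and authenticating with a secret API key. Several details are assumptions, not confirmed by the text: the base URL and `/documents/<id>/` path, the `ApiKey username:key` form of the Authorization header, and all media types other than `application/json` (the others are the W3C-registered types for PROV-N, PROV-XML, and Turtle). The helper names `build_document_request` and `fetch_document` are hypothetical. OAuth 1 request signing is not shown.

```python
import urllib.request

# Assumed base URL of the ProvStore API (hypothetical; check the service docs).
API_BASE = "https://provenance.ecs.soton.ac.uk/store/api/v0"

# Media types used for content negotiation. application/json for PROV-JSON is
# stated in the text; the others are the W3C-registered types for PROV-N,
# PROV-XML and Turtle, assumed (not confirmed) to be accepted by ProvStore.
MEDIA_TYPES = {
    "json": "application/json",
    "provn": "text/provenance-notation",
    "xml": "application/provenance+xml",
    "turtle": "text/turtle",
}


def build_document_request(doc_id: int, fmt: str, username: str,
                           api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one stored document.

    The endpoint path and the "ApiKey user:key" header format are assumptions.
    """
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unsupported format: {fmt!r}")
    url = f"{API_BASE}/documents/{doc_id}/"
    headers = {
        # Content negotiation: the server picks the serialization from Accept.
        "Accept": MEDIA_TYPES[fmt],
        # Secret-API-key authentication (assumed header format).
        "Authorization": f"ApiKey {username}:{api_key}",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_document(doc_id: int, fmt: str, username: str, api_key: str) -> str:
    """Send the request and return the response body decoded as UTF-8 text."""
    req = build_document_request(doc_id, fmt, username, api_key)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Separating request construction from sending keeps the negotiation and authentication logic testable without network access; `fetch_document(42, "provn", "alice", "<api-key>")` would then retrieve a (hypothetical) document 42 in PROV-N.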

Similar resources

Secure Provenance for Data Preservation Repositories

Importance of research data preservation and management has been accepted by the scientists all around the world. Interest and investment in data preservation projects has become higher than ever before. There are already a number of well-known research data repositories for different types of research data. Data preservation, sharing, discovery and reuse are the key features which are common acro...

Managing Provenance in Scientific Workflows with ProvManager

Running scientific workflows in distributed environments is motivating the definition of provenance gathering approaches that are loosely coupled to the workflow systems. We have proposed a provenance gathering strategy that is independent from workflow system technology. This strategy has evolved into a provenance management system named ProvManager. The main principle is that each workflow ac...

Provenance Information in Biomedical Knowledge Repositories - A Use Case

We present a use case for provenance information in biomedical knowledge repositories designed to support applications including information retrieval and knowledge discovery. We show that information about the knowledge sources from which statements are extracted must be recorded in addition to the statements themselves in order to support these applications. While the storage and processing of...

Provenance Storage, Querying, and Visualization in PBase

We present PBase, a repository for scientific workflows and their corresponding provenance information that facilitates the sharing of experiments among the scientific community. PBase is interoperable since it uses ProvONE, a standard provenance model for scientific workflows. Workflows and traces are stored in RDF, and with the support of SPARQL and the tree cover encoding, the repository pro...

Start Smart and Finish Wise: The Kiel Marine Science Provenance-Aware Data Management Approach

While creating or processing scientific data, it is very important to capture and to archive the corresponding provenance data. “Start smart and finish wise” is our approach for a provenance aware tooling, which helps data managers and scientists not only to manage their data, but also to capture their scientific data in the field, to record the provenance data, to store it for further analysis...


Publication date: 2014